Temporal Models and Smoothing
inlabru shortcuts
In the slides we have fitted the model as:
but in the practicals you have used the bru_obs() function; it can also be defined inside the bru() call!
y ~ . just means “take all the components and sum them together”. It also tells the bru() function that your predictor is linear.

predict() and generate() functions
In the practicals you have used the predict() function to get results from the fitted model.
predict() simulates from the fitted posterior distribution \(\pi(\mathbf{u},\theta\mid\mathbf{y})\), computes what you ask, and produces a summary of the samples (mean, sd, quantiles, etc.)
generate() simulates from the fitted posterior distribution \(\pi(\mathbf{u},\theta\mid\mathbf{y})\), computes what you ask, and returns the samples!
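Conceptually, predict() is just generate() followed by a summarising step. A base-R sketch of that step (the samples matrix here is a hypothetical stand-in, not real inlabru output):

```r
# Hypothetical stand-in for a generate() result: a matrix of posterior
# samples (rows = samples, columns = the 3 quantities we asked for).
set.seed(1)
samples <- matrix(rnorm(3000, mean = 0.17, sd = 0.08), nrow = 1000, ncol = 3)

# predict() is essentially generate() plus this summarising step:
summarise_samples <- function(s) {
  data.frame(mean   = colMeans(s),
             sd     = apply(s, 2, sd),
             q0.025 = apply(s, 2, quantile, probs = 0.025),
             q0.5   = apply(s, 2, quantile, probs = 0.5),
             q0.975 = apply(s, 2, quantile, probs = 0.975))
}
summarise_samples(samples)
```

Keeping the raw samples (generate()) lets you compute any functional of the posterior yourself; predict() is the convenient shortcut when standard summaries are all you need.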
Let’s fit the model for the Tokyo rainfall data \[ \begin{aligned} y_t|\eta_t&\sim\text{Bin}(n_t, p_t),\qquad t = 1,\dots,366\\ \eta_t &= \text{logit}(p_t)= \beta_0 + f(\text{time}_t) \end{aligned} \]
We now want to extract results…what do we want?
We can get all of them with the predict() or generate() functions!
predict() output:
y n time mean sd q0.025 q0.5 q0.975 median sd.mc_std_err mean.mc_std_err
1 0 2 1 0.174 0.087 0.057 0.156 0.388 0.156 0.003 0.003
2 0 2 2 0.172 0.082 0.059 0.156 0.369 0.156 0.003 0.003
3 1 2 3 0.169 0.077 0.062 0.157 0.347 0.157 0.002 0.003
generate() output:
    time_eff   lin_pred     probs
1 -0.3865601 -1.560139 0.1736267
2 -0.4737201 -1.647299 0.1614743
3 -0.5418052 -1.715384 0.1524666
Both these things require a little more thinking when using inlabru than when using lm() or similar.
Let’s look at the mtcars dataset that is built into R
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
We want to fit a model where gear is the only covariate.
gear has three categories: 3, 4 and 5
How do we do this in inlabru?
We can use the model.matrix() function in R:
                  (Intercept) gear4 gear5
Mazda RX4 1 1 0
Mazda RX4 Wag 1 1 0
[1] 16.103112 8.414875 5.253896
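The dummy coding above is base R; a quick sketch with mtcars showing the design matrix and the corresponding lm() fit (the posterior means above are close to, but not identical to, these least-squares estimates):

```r
df <- mtcars
df$gear <- factor(df$gear)            # categories 3, 4, 5; gear3 is the baseline
X <- model.matrix(~ gear, data = df)  # columns: (Intercept), gear4, gear5
head(X, 2)
coef(lm(mpg ~ gear, data = df))       # intercept = mean mpg of the gear-3 cars
```

The gear4 and gear5 columns are indicator variables, so the coefficients are differences from the baseline level.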
df$gear_id = as.numeric(df$gear) # inlabru needs values like 1, 2, 3 for random effects
cmp2 = ~ -1 + gear_effect(gear_id, model = "iid", fixed = TRUE, initial = -6)
# or: cmp2 = ~ Intercept(1) + gear_effect(gear_id, model = "iid", fixed = TRUE, initial = -6, constr = TRUE)
lik2 = bru_obs(formula = mpg ~ .,
               data = df)
fit2 = bru(cmp2, lik2)
fit2$summary.random$gear_effect$mean
[1] 16.04861 24.42290 21.15058
What is happening here??
Say we want to fit this model
(Intercept) gear4 gear5 vs1 gear4:vs1 gear5:vs1
15.050000 5.950000 4.075000 5.283333 -1.043333 5.991667
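These coefficients match what base-R lm() gives with gear and vs treated as factors; a quick check:

```r
df <- mtcars
df$gear <- factor(df$gear)
df$vs   <- factor(df$vs)
co <- coef(lm(mpg ~ gear * vs, data = df))
co  # (Intercept) is the mean mpg of the gear-3, vs-0 cars
```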
Here the best solution is to use the model.matrix() again
(Intercept) gear4 gear5 vs1 gear4:vs1 gear5:vs1
1 1 0 0 0 0
cov_effect = model.matrix(~ gear * vs, data = df) # gear and vs as factors
df = cbind(df, cov_effect[,-1]) # attach this to your data (drop the intercept column)
cmp3 = ~ Intercept(1) + gear_4(gear4, model = "linear") +
gear_5(gear5, model = "linear") + vs_1(vs1, model = "linear") +
gear4_vs1(`gear4:vs1`, model = "linear") + gear5_vs1(`gear5:vs1`, model = "linear")
lik3 = bru_obs(formula= mpg ~ ., data = df)
fit3 = bru(cmp3, lik3)
fit3$summary.fixed$mean
[1] 15.0434617 5.8986234 4.0890402 5.2876293 -0.9880579 5.8809788
(Intercept) gear4 gear5 disp gear4:disp gear5:disp
24.51556635 15.05196338 7.14538044 -0.02577046 -0.09644222 -0.02500467
Option 1: Again we can use the model.matrix() option:
(Intercept) gear4 gear5 disp gear4:disp gear5:disp
1 1 0 160 160 0
cov_effect = model.matrix(~ gear * disp, data = df) # gear as a factor
df = cbind(df, cov_effect[,-1]) # attach this to your data (drop the intercept column)
cmp4 = ~ Intercept(1) + gear_4(gear4, model = "linear") +
gear_5(gear5, model = "linear") + disp(disp, model = "linear") +
gear4_disp(`gear4:disp`, model = "linear") + gear5_disp(`gear5:disp`, model = "linear")
lik4 = bru_obs(formula= mpg ~ ., data = df)
fit4 = bru(cmp4, lik4)
fit4$summary.fixed$mean
[1] 24.50105803 14.96909092 7.11470263 -0.02572924 -0.09575832 -0.02486880
(Intercept) gear4 gear5 disp gear4:disp gear5:disp
24.51556635 15.05196338 7.14538044 -0.02577046 -0.09644222 -0.02500467
Option 2: fixed effects are random effects with fixed precision!
df$gear_id = as.numeric(df$gear) # inlabru needs values like 1, 2, 3 for random effects
cmp5 = ~ -1 + gear_effect_int(gear_id, model = "iid", fixed = TRUE, initial = -6) +
  gear_effect_slope(gear_id, disp, model = "iid", fixed = TRUE, initial = -6)
lik5 = bru_obs(formula = mpg ~ .,
               data = df)
fit5 = bru(cmp5, lik5)
fit5$summary.random$gear_effect_int$mean
[1] 24.15667 38.93827 31.16922
fit5$summary.random$gear_effect_slope$mean
[1] -0.02475096 -0.11752709 -0.04884811
Again, the differences are just a matter of parameterisation!
Data are often observed in time, and time dependence is often expected.
Note: We can use the same model to smooth covariate effects!
Smoothing of the time effect
Prediction
We can “predict” any unobserved data, not only future data, e.g. gaps in the data etc.
Time can be indexed over a
Discrete domain (e.g., years)
Main models: RW1, RW2 and AR1
Note: RW1 and RW2 are also used for smoothing covariates
Continuous domain
Goal: we want to understand the pattern and predict into the future.
Random walk models encourage the mean of the linear predictor to vary gradually over time.
They do this by assuming that, on average, the time effect at each point is the mean of the effect at the neighbouring points.
Random Walk of order 1 (RW1): we take the two nearest neighbours
Random Walk of order 2 (RW2): we take the four nearest neighbours
Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)
Definition
\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \]
\[ \mathbf{Q} = \tau \begin{bmatrix} 1 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \\ & & & & -1 & 1 \end{bmatrix} \]
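The \(\mathbf{Q}\) above is just \(\tau\,\mathbf{D}^{\top}\mathbf{D}\), with \(\mathbf{D}\) the first-difference matrix. A small base-R sketch:

```r
# RW1 precision matrix: Q = tau * t(D) %*% D, with D the first-difference matrix
rw1_Q <- function(n, tau = 1) {
  D <- diff(diag(n))   # (n-1) x n matrix with rows (..., -1, 1, ...)
  tau * crossprod(D)   # tridiagonal: diagonal 1, 2, ..., 2, 1; off-diagonal -1
}
rw1_Q(5)
```

Note that Q %*% rep(1, 5) is the zero vector: adding a constant to \(\mathbf{u}\) does not change the density, which is the rank deficiency discussed later.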
\(\tau\) says how much \(u_t\) can vary around its mean
We need to set a prior distribution for \(\tau\).
A common option is the so-called PC prior (Penalised Complexity prior).
PC priors are easily available in inlabru for many model parameters. They are built with two principles in mind:
A line is the base model
We want to penalize more complex models
\[ \begin{eqnarray} \sigma = \sqrt{1/\tau} \\ \text{Prob}(\sigma>U) = \alpha;\\ \qquad U>0, \ \alpha \in (0,1) \end{eqnarray} \]
Either choose \(U\) as an upper limit for the standard deviation and \(\alpha\) a small probability,
or choose \(U\) as a likely value for the standard deviation and set \(\alpha=0.5\).
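For a precision, the PC prior has a known closed form: it is an exponential distribution on \(\sigma\), with rate \(\lambda\) determined by the pair \((U,\alpha)\). A quick sketch:

```r
# PC prior for a precision: sigma = 1/sqrt(tau) gets an exponential prior
# with rate lambda chosen so that Prob(sigma > U) = alpha.
pc_prec_lambda <- function(U, alpha) -log(alpha) / U
lambda <- pc_prec_lambda(U = 0.3, alpha = 0.5)
pexp(0.3, rate = lambda, lower.tail = FALSE)  # recovers alpha = 0.5
```

This is why the param = c(U, alpha) pair fully specifies the pc.prec prior in the code later on.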
The Model
\[ \begin{aligned} y_i|\eta_i, \sigma^2 & \sim \mathcal{N}(\eta_i,\sigma^2)\\ \eta_i & = \beta_0 + f(t_i)\\ f(t_1),f(t_2),\dots,f(t_n) &\sim \text{RW2}(\tau) \end{aligned} \]
RW1 defines differences, not absolute levels:
Only the changes between neighbouring terms are modelled.
Mathematically, \[ (u_1,\dots,u_n)\text{ and }(u_1+a,\dots,u_n+a) \] produce identical likelihoods — they’re indistinguishable.
This means the overall level of \(\mathbf{u}\) is confounded with the intercept, so the parameters are not well-defined.
Solution: add a sum-to-zero constraint on \(\mathbf{u}\) (the INLA default for intrinsic models), so the intercept carries the overall level.
[1] "FIT1 - Intercept"
mean sd 0.025quant 0.5quant 0.975quant mode kld
Intercept 174.138 0.001 174.135 174.138 174.141 174.138 0
[1] "FIT2 - Intercept"
mean sd 0.025quant 0.5quant 0.975quant mode kld
Intercept 0 31.623 -62.009 0 62.009 0 0
[1] "FIT1 - RW1 effect"
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
1 1918 -0.122 0.014 -0.150 -0.122 -0.094 -0.122 0
2 1919 0.040 0.014 0.011 0.040 0.067 0.040 0
[1] "FIT2 - RW1 effect"
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
1 1918 174.017 31.623 112.008 174.017 236.025 174.017 0
2 1919 174.177 31.623 112.168 174.177 236.185 174.177 0
\[ u_t = \text{mean}(u_{t-2} ,u_{t-1} , u_{t+1}, u_{t+2} ) + \text{some Gaussian error with precision } \tau \]
RW2 models are smoother than RW1
The precision has the same role as for RW1
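Analogously to RW1, the RW2 precision matrix is built from second differences; both constants and linear trends lie in its null space, which is why RW2 fits are smoother:

```r
# RW2 precision matrix: Q = tau * t(D2) %*% D2, D2 the second-difference matrix
rw2_Q <- function(n, tau = 1) {
  D2 <- diff(diag(n), differences = 2)  # (n-2) x n, rows (..., 1, -2, 1, ...)
  tau * crossprod(D2)
}
Q <- rw2_Q(6)
c(max(abs(Q %*% rep(1, 6))), max(abs(Q %*% (1:6))))  # both ~0: rank deficiency 2
```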
cmp1 = ~ Intercept(1) +
  time(year, model = "rw1",
       scale.model = TRUE,
       hyper = list(prec =
         list(prior = "pc.prec",
              param = c(0.3, 0.5))))

cmp2 = ~ Intercept(1) +
  time(year, model = "rw2",
       scale.model = TRUE,
       hyper = list(prec =
         list(prior = "pc.prec",
              param = c(0.3, 0.5))))

lik = bru_obs(formula = Erie ~ .,
              data = lakes)
fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)

NOTE: the scale.model = TRUE option scales the \(\mathbf{Q}\) matrix so the precision parameter has the same interpretation in both models.
RW models are discrete models
Covariates are often recorded as continuous values
The function inla.group() will bin covariate values into groups (default 25 groups)
inla.group(x, n = 25, method = c("cut", "quantile"))
Two ways to bin:
cut (default) splits the data using equal-length intervals
quantile uses equi-distant quantiles in the probability space

The data are derived from an anthropometric study of 892 females under 50 years in three Gambian villages in West Africa.
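A base-R sketch of the "cut" binning idea (hypothetical covariate x; inla.group() chooses its own bin representatives, so this is only an analogue):

```r
set.seed(1)
x <- runif(200, 0, 50)                     # hypothetical continuous covariate
n <- 25
breaks <- seq(min(x), max(x), length.out = n + 1)
mids   <- (breaks[-1] + breaks[-(n + 1)]) / 2
x_bin  <- mids[cut(x, breaks, include.lowest = TRUE, labels = FALSE)]
length(unique(x_bin))                      # at most n distinct values
```

After binning, the covariate takes few distinct values, so a discrete RW model over those values becomes usable.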
Age - Age of respondent (continuous)
triceps - Triceps skinfold thickness.
Latent effects suitable for smoothing and modelling temporal data.
One hyperparameter: the precision \(\tau\)
It is an intrinsic model
The precision matrix \(\mathbf{Q}\) is rank deficient
A sum-to-zero constraint is added to make the model identifiable!
RW2 models are smoother than RW1
Definition
\[ u_t = \phi u_{t-1} + \epsilon_t; \qquad \phi\in(-1,1), \ \epsilon_t\sim\mathcal{N}(0,\tau^{-1}) \]
\[ \pi(\mathbf{u}|\tau,\phi)\propto\exp\left(-\frac{\tau}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right) \]
with
\[ \mathbf{Q} = \begin{bmatrix} 1 & -\phi & & & & \\ -\phi & (1+\phi^2) & -\phi & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -\phi & (1+\phi^2) & -\phi \\ & & & & -\phi & 1 \end{bmatrix} \]
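A quick numerical check that this precision matrix really encodes AR1 dependence: inverting \(\mathbf{Q}\) gives a covariance matrix whose lag-1 correlation is exactly \(\phi\).

```r
# Tridiagonal AR1 precision matrix (tau = 1 here), as written above
ar1_Q <- function(n, phi) {
  Q <- diag(c(1, rep(1 + phi^2, n - 2), 1))
  Q[cbind(1:(n - 1), 2:n)] <- -phi
  Q[cbind(2:n, 1:(n - 1))] <- -phi
  Q
}
S <- solve(ar1_Q(5, phi = 0.7))  # covariance matrix
S[1, 2] / S[1, 1]                # lag-1 correlation: 0.7
```

Unlike the RW models, this \(\mathbf{Q}\) has full rank, so the AR1 model is proper and needs no identifiability constraint.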
The AR1 model has two parameters:

The precision \(\tau\), with PC prior pc.prec: \[ \text{Prob}(\sigma > u) = \alpha \]

The autocorrelation (or persistence) parameter \(\phi\in(-1,1)\), with PC prior pc.cor0 (base model \(\phi = 0\)): \[ \begin{eqnarray} \text{Prob}(|\phi| > u) = \alpha;\\ -1<u<1;\ 0<\alpha<1 \end{eqnarray} \]
An alternative PC prior for the autocorrelation is pc.cor1 (base model \(\phi = 1\)): \[ \begin{eqnarray} \text{Prob}(\phi > u) = \alpha;&\\ -1<u<1;\qquad &\sqrt{\frac{1-u}{2}}<\alpha<1 \end{eqnarray} \]
The Model
\[ \begin{aligned} y_t|\eta_t & \sim \text{Poisson}(\exp(\eta_t))\\ \eta_t & = \beta_0 + u_t\\ 1.\ u_t&\sim \text{RW}1(\tau)\\ 2.\ u_t&\sim \text{AR}1(\tau, \phi)\\ \end{aligned} \]
Number of serious earthquakes per year
hyper = list(prec = list(prior = "pc.prec", param = c(1,0.5)))
cmp1 = ~ Intercept(1) + time(year, model = "rw1", scale.model = TRUE,
                             hyper = hyper)
cmp2 = ~ Intercept(1) + time(year, model = "ar1",
                             hyper = hyper)
lik = bru_obs(formula = quakes ~ .,
              family = "poisson",
              data = df)
fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)

Estimated trend
Predictions
RW1 models can be seen as the limit of AR1 models as \(\phi\rightarrow 1\)
They are not too different when smoothing, but can give different predictions.
Sometimes the data are not collected at discrete time points but over continuous time.
One solution (usually not a bad one) is to discretise time and use a RW model (AR models are then harder to justify).
A different solution is to use a continuous smoother based on a continuous Gaussian Random Field (GRF) \(\omega(t)\).
What is this?
\[ (\omega(t_1),\omega(t_2),\dots,\omega(t_n))\sim\mathcal{N}(\mathbf{0},\mathbf{\Sigma}_{\omega(t_1),\omega(t_2),\dots,\omega(t_n)} ), \qquad \text{ for any } t_1,t_2,\dots,t_n\in\mathcal{T} \] Where \(\mathbf{\Sigma}_{\omega(t_1),\omega(t_2),\dots,\omega(t_n)}\) is a variance-covariance matrix.
To define \(\omega(t), t\in\mathcal{T}\), we have to define \(\Sigma\)
BUT…
Covariance matrices are not nice objects!
We define a (Matern) GRF as the solution of a stochastic partial differential equation (SPDE)
\[ (\kappa^2-\Delta)^{\alpha/2}\omega(t) = W(t) \]
What is this?
Think of white noise as someone randomly tapping along a rope.
But the rope has tension and stiffness
The parameters:
Tension = controls how far randomness propagates (\(\kappa\))
Stiffness = controls how smooth the field becomes (\(\alpha\))
Ok…but we still need to solve the SPDE to find \(\omega(t)\)!
Now we need to discretise the domain into \(T\) points (we cannot compute on a continuum!)
We represent our solution as
\[ \omega(t) = \sum_{i = 1}^T\phi_i(t)w_i \]
where the \(\phi_i(t)\) are local basis functions attached to the \(T\) discretisation points and the \(w_i\) are random weights.
The weights vector \(\mathbf{w} = (w_1,\dots,w_T)\) is Gaussian with a sparse precision matrix \(\longrightarrow\) computational convenience
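A minimal sketch of the basis representation using piecewise-linear ("hat") functions on equally spaced knots (one common choice; the code later in this deck uses degree-2 B-splines instead):

```r
# Hat basis: phi_i is 1 at knot i and decays linearly to 0 at the neighbours
hat_basis <- function(t, knots) {
  h <- knots[2] - knots[1]  # equally spaced knots
  sapply(knots, function(k) pmax(0, 1 - abs(t - k) / h))
}
knots <- seq(0, 10, by = 2)
set.seed(1)
w     <- rnorm(length(knots))          # the Gaussian weights
Phi   <- hat_basis(c(0, 1, 2), knots)  # evaluate the basis at t = 0, 1, 2
omega <- drop(Phi %*% w)               # omega(t) = sum_i phi_i(t) w_i
```

At a knot, only one basis function is non-zero, so \(\omega(t_i) = w_i\); between knots \(\omega(t)\) interpolates the weights.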
The field has two parameters: the range \(\rho\) and the marginal standard deviation \(\sigma\).
These parameters are linked to the parameters of the SPDE.
We need to assign priors to them.
Remember:
For the range we have: \[ \text{Prob}(\rho<\rho_0) = p_{\rho} \]
For the marginal standard deviation we have: \[ \text{Prob}(\sigma<\sigma_0) = p_{\sigma} \]
If you want to use a continuous model for time (or for smoothing a covariate) you need to:
Let’s go back to the Triceps skinfold thickness example.
The Model
\[ \begin{aligned} y_i|\eta_i, \sigma^2 & \sim \mathcal{N}(\eta_i,\sigma^2)\\ \eta_i & = \beta_0 + \omega(\text{age}_i)\\ \omega(\text{age}_i) & \sim \text{GF}(\rho_{\omega},\sigma_{\omega}) \end{aligned} \]
The code
locs = seq(-5,60,5)
mesh1d = fm_mesh_1d(locs, degree = 2, boundary = "neumann")
spde <- inla.spde2.pcmatern(mesh1d,
prior.range = c(10, 0.5),
prior.sigma = c(1, 0.5))
cmp <- ~ Intercept(1) + field(age, model = spde)
lik_spde = bru_obs(formula = triceps ~ ., data = triceps)
fit_spde = bru(cmp, lik_spde)